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data from the Missouri Assessment Program to examine effects of using the 
read-aloud accommodation on the characteristics of multiple-choice math and 
reading test items administered to approximately 3200 students in 3rd and 4th 
grades. Four subject groups were defined: non- accommodated students with no 
disabilities; non- accommodated students with no disabilities matched in test 
performance to students with a reading disability; non- accommodated students 
with a reading disability; and students with a reading disability who took 
the test with the read-aloud accommodation. Differential item function (DIF) 
analysis was used to determine numbers of difficult items for each group. 
Significantly more DIF items were identified for students receiving the read 
aloud accommodation than for similarly disabled students tested without the 
accommodation, suggesting that the read-aloud accommodation is ineffective 
for students with reading disabilities or that a reading test measures the 
reading construct differently for students with a reading disability than for 
students without disabilities. Appendices present detailed test results. 
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Executive Summary 






About 50% of students with a disability participate in the Missouri assessment program with 
one or more testing accommodations. One of the most prevalent and controversial accommoda- 
tions is an audio presentation of written test material; for instance, having a proctor read the 
math test items and response options to the student. There is a dearth of empirical evidence that 
demonstrates how audio presentation of test material, heretofore referred to as the read-aloud 
accommodation, affects the construct the test was designed measure. This study uses actual test 
administration data from the Missouri Assessment Program to examine the effect of using the 
read-aloud accommodation on the characteristics of multiple-choice math and reading test items 
administered to students in 3rd and 4th grade. 

One of the criticisms of using extant data is that it is difficult to isolate the effect of an accom- 
modation because the presence of an accommodation is confounded with other student charac- 
teristics as well as other accommodations. In this study we defined four groups to help control 
for such confounding effects. Group A represented a random sample of non-accommodated 
students with no disabilities; Group B was a control group of non-accommodated students with 
no disabilities who were matched in overall test performance to students with a reading disabil- 
ity; Group C represented students whose primary disability was in reading and who took the 
test without an accommodation; Group D represented students whose primary disability was in 
reading and who took the test with the read-aloud accommodation. 

Differential item function (DIF) analysis was run on the data using BILOG-MG, which com- 
pared item difficulty estimates across several groups simultaneously. The item difficulty esti- 
mates for Groups B, C, and D were compared to the estimates for Group A to examine DIF. 
Because an accommodation is considered a way of improving the quality of measurement for 
students with a disability, it would be expected that the number of DIF items would be greatest 
for Group C and that there would be few DIF items for Group D. Among the 32 multiple -choice 
math test items, there was one DIF item for Group C and six DIF items for Group D. Among the 
41 multiple-choice reading items, there were 10 DIF items for Group C and 19 DIF items for 
Group D. 

These findings raise some important questions. First, how can we determine who should ben- 
efit from an accommodation? It makes sense to assume that students with a reading disability 
should benefit from a read-aloud accommodation, yet there were fewer DIF items for those 
students with a reading disability who did not get the accommodation than for those who did. 
Replication is needed to confirm this result. Second, it would appear that a reading test mea- 
sures the reading construct differently for students with a reading disability than for students 
without disabilities, whether or not the students received the read-aloud accommodation. With- 
out the accommodation, 25% of the items were identified as DIF items, and with the accommo- 
dation about 50% of the items were identified as DIF items. More research is needed to deter- 
mine the best ways to measure reading skills for students who struggle with reading. 
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Overview 



States continue to struggle with the best ways to include students with disabilities in state testing 
programs. The menu of testing options available for students with disabilities includes taking 
the regular assessment without accommodations, taking the regular assessment with 
accommodations, or taking a different assessment. Many students with disabilities take the 
regular assessment under standard conditions, but a large percent take the regular assessment 
under accommodated conditions (Thurlow, House, Boys, Scott, & Ysseldyke, 2000). Because 
the validity evidence for a test is usually tied to an administration without accommodations, it 
is necessary to gather additional validity evidence for test scores obtained under accommodated 
conditions. With federal legislation requiring states to report disaggregated results for students 
with disabilities, the need for additional validity evidence is even greater. 

In the research literature on test accommodations, the term accommodation refers to that subset 
of test changes that do not alter the construct the test was designed to measure. Alterations that 
change the construct of the test are sometimes called modifications, although some states use 
the term “modification” for changes considered appropriate (Thurlow, & Weiner, 2000). 
Generally, an accommodation can be defined as an alteration to the standard test conditions that 
neutralizes extraneous sources of difficulty resulting from an interaction between standard 
administration and a student's disability while preserving the measurement goals of the test 
(Elliott, Kratochwill, McKevitt, Schulte, Marquart, & Mroch, 1999; Phillips, 1994; Tindal, 
Helwig, & Hollenbeck, 1999; Willingham, Ragosta, Bennett, Braun, Rock, & Powers, 1988). 
Validation studies are necessary to ascertain the appropriateness of an alteration. Unfortunately, 
the evidence substantiating the validity of an accommodation is scant, even for some of the 
most widely used alterations, such as the read-aloud administration of a test. 

Two types of validation evidence for testing accommodation can be derived from the definition 
of test accommodations. One type of evidence is the test score boost. If accommodations remove 
extraneous sources of difficulty resulting from an undesirable interaction between standard 
testing conditions and a student’s disability, then a valid accommodation should result in a 
performance boost for students with that disability. Because a valid accommodation acts only 
on the interaction between disability and standard testing conditions, it should not result in a 
test score boost for students without a disability (Phillips, 1994). Such test score boosts represent 
necessary but insufficient data to conclude that an accommodation is valid. 

A valid test accommodation should preserve the measurement goals of the test as well as result 
in a performance boost. In other words, it should not alter the construct the test was designed to 
measure. There are several ways to ascertain whether a set of test items measures the same 
construct for different examinee groups. One way is to apply differential item function analysis, 
known as DIF. DIF analysis represents a way to compare item characteristics such as item 
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difficulty across groups. Under the notion of DIF, groups are first equated on ability, which is 
usually estimated by each person's overall test score. Presumably, if all of the items measure 
the same construct in the same way for all groups, then each item should be equally difficult 
across the ability-matched groups. Finding that many of the items display DIF would be an 
indication that the test is measuring something different across groups. 

DIF analysis requires large samples, at least several hundred cases per group. Most experimental 
studies contain fewer than 100 participants. State testing databases contain thousands of cases, 
making it possible to conduct DIF analysis. In addition to containing large samples, such databases 
represent the real-life testing situations. It is essential that test accommodations demonstrate 
validity in the real testing environment. Extant databases represent a wonderful source of 
information for ascertaining the validity of testing accommodations in natural settings. 

There are some shortcomings to evaluating the validity of testing accommodations using extant 
data. Foremost is the fact that there may not be naturally occurring appropriate control groups. 
Students with disabilities who take a test with an accommodation differ from students without 
disabilities who do not use accommodations in three common ways: (1) they have a disability, 
(2) they took the test with an accommodation, and (3) as a group their overall performance is 
usually (but not always) much lower than their peers without disabilities. In DIF studies that do 
not account for each of these differences, it would not be appropriate to attribute DIF to the 
presence of an accommodation because the accommodation is confounded with other factors. 
One possible solution is to define successive groups wherein one group differs from the reference 
group only in terms of performance, another that differs in terms of performance and presence 
of a disability, and one that differs in terms of performance, presence of a disability, and presence 
of an accommodation. Although it is not possible to ensure that these groups differ only in the 
ways indicated because students are not randomly assigned to groups, the three layers of control 
groups do make it possible to begin to isolate the effect of the test accommodation on item 
functioning. However, even if DIF is attributed in this way to the presence of a testing 
accommodation, one cannot conclude that the accommodation itself is invalid. 

There are three possible explanations for why differential item functioning might exist. It may 
be that the accommodation simply does not do what proponents think it should do. In other 
words, the accommodation may change the construct that the test was designed to measure. 
Another possibility is that the accommodation was not appropriately administered. For example, 
an accommodation that requires a proctor to read math test items may be ineffective because 
the proctor read the items too quickly. A third possibility is that the accommodation may have 
been administered to students who do not actually need it. Research on how accommodations 
decisions are made indicates that the decision makers, members of the EEP team, tend to over- 
accommodate students (Elliott et al., 1999; Fuchs , Kratochwill, McKevitt, Schulte, Marquart, 
& Mroch 2000; Fuchs, Eaton, Hamlett, & Kams, 2000). In other words, they use accommodations 
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that have no benefit for the student. Regardless of knowing the reason why, it is still important 
to ascertain whether accommodated test scores measure the same construct as non-accommodated 
test scores. 

The read-aloud accommodation is one of the most commonly used accommodations (Bielinski, 
Ysseldyke, Bolt, Friedebach, & Fredebach, 2001). Prior to this study, only two studies of the 
read-aloud accommodation used test data from an actual statewide administration of an 
achievement test to evaluate the validity of the accommodation. Both of those studies used 
structural equation models to evaluate factor invariance between the scores of non-accommodated 
students and the scores of students receiving the read-aloud accommodation. Tippets and 
Michaels (1997) studied the effect of the read-aloud accommodation on a reading test and a 
language usage test. The tests were part of the Maryland School Performance Assessment 
Program. Pomplun and Omar (2000) studied the effect of the read-aloud accommodation on a 
math test that was part of the Kansas Assessment Program. Each study reported that the factor 
structure was the same for the non-accommodated and the accommodated groups. Both studies 
fitted a two-factor model to the item sets because a two-factor model resulted in a significant 
improvement in model fit. However, the reality is that scores from these assessments are based 
on a single latent trait; therefore, their findings cannot be generalized to the scores actually 
reported in those states. 

The method used in this study produces results that demonstrate whether test items function the 
same under non-accommodated and accommodated conditions when a single latent trait model 
is used. In this way, the results can indicate the validity of the test scores reported in the assessment 
system when the scores were obtained under accommodated conditions. 

The read-aloud accommodation, as with any accommodation, should result in scores that are 
more meaningful for students who need the accommodation as compared to the scores obtained 
by similar students taking the test without the accommodation. Tindal, Heath, Hollenbeck, 
Almond, and Hamiss (1998) emphasized the importance of the area of student need as defined 
by the student’s Individualized Education Program (IEP) for making the determination about 
which accommodations to give; students with the same primary IEP area probably have a common 
need and should be given the same accommodation. In the Tindal et al. (1998) study, the effect 
of the read-aloud accommodation on the construct validity of a reading and a math test was 
evaluated by comparing item difficulty invariance across groups. One group included students 
whose primary IEP area was reading and who received the read-aloud accommodations; another 
group also had reading as their primary IEP area but did not receive an accommodation; the 
other two groups consisted of students without disabilities taking the test without an 
accommodation. 
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Research Questions 



Presumably, all students whose primary IEP area is reading should benefit from the read-aloud 
accommodation. The performance of students with a reading disability not receiving the read- 
aloud accommodation should be affected by extraneous sources of difficulty that alter item 
characteristics. Therefore, the item characteristics based on the students who received the 
accommodation should closely resemble the item characteristics in the comparison group of 
non-disabled students, whereas the item characteristics for students with a reading EEP not 
receiving the accommodation should differ markedly from those in the comparison group. 

The research questions for this study are based on the preceding rationale. The two specific 
research questions are: 

• Is item difficulty the same for students receiving the read-aloud accommodation as for 
non-accommodated students without disabilities? 

• Does item difficulty markedly differ for students who need the read-aloud accommodation, 
but did not receive it when compared to the non-accommodated students without 
disabilities? 



Participants 

This study uses data from the 1998 administration of the Missouri Assessment Program (MAP). 
All public school students are required to participate in MAP. Students may participate in the 
program by taking the regular assessments without accommodations, the regular assessments 
with accommodations, or an alternate assessment. For students with disabilities, the decision 
about how the student should participate is usually made by the student’s IEP team. The students 
and their parents are encouraged to participate in the decision process. 

When students with disabilities take the test, their test forms are marked to indicate (1) the 
primary area of their special instruction (e.g., reading, math, behavior), adding with their category 
of disability. In addition, the accommodations used by students are indicated. The focus of the 
study is on those students who primarily receive special education services for reading instruction. 
For each examinee, the proctor indicated the student's primary IEP instructional area. Only 
those students for whom the primary instructional area was reading were used in the two special 
education groups defined below. 



In 1998, there were 52,387 third grade students with a valid communications arts (reading) test 
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score, and 66,800 fourth graders with a valid math test score. Students receiving special education 
services constituted 1 1.4% (N=5,962) of the population taking the communication arts test, and 
12.7% (N=8,491) of the population taking the mathematics test. From these data, four groups 
of students were defined. Group A represented a random sample of approximately 1000 general 
education students who took the test without an accommodation. Group B represented a random 
sample of approximately 1000 general education students who took the test without an 
accommodation and who were matched in ability to the group of students with an IEP in reading. 
The selection of the sample of cases for Group B was done so that the distribution of the number- 
correct scores on the multiple-choice items matched the pooled distribution of the number- 
correct scores for groups C and D. Group C represented all of the students whose primary IEP 
instructional area was reading and who took the test without an accommodation. Group D 
represented all of the students whose primary IEP instructional area was reading and who took 
the test under the read-aloud accommodation (either alone or in combination with extended 
time, small group administration, or both). Most of the students receiving the read-aloud 
accommodation also received extra time and took the test in a small group. 

Table 1 is a summary of group performance on the each test measured as the number of the 
multiple-choice items answered correctly. Groups B, C, and D have similar means and standard 
deviations on both the reading test and the mathematics test. The mean number-correct score 
for Group A was about one standard deviation unit greater than the mean score in the other three 
groups on both tests. The ratio of males to females in groups C and D was about 2 to 1 . These 
are the groups comprised of students with an IEP in reading. This ratio is similar to the ratio of 
males and females in special education. Additionally, just over 90% of the students in each 
group had a learning disability (LD). Groups A and B consisted of nearly equal numbers of 
males and females. 



Instruments 

The reading and mathematics tests used in the Missouri Assessment Program (MAP) consist of 
a combination of norm-referenced test items developed by CTB-McGraw/Hill and items 
developed expressly to measure Missouri's Show Me Standards. Tests were divided into three 
sections referred to as sessions. Session 1 consisted of a combination of performance events 
and constructed response items. Session 2 consisted of only constructed response items, and 
Session 3 consisted of only multiple-choice items. The Session 3 items were adapted from the 
Terra Nova™ to meet Missouri’s achievement standards and they are included to provide national 
normative comparisons. The present study used only the multiple-choice items. 

The reading test administered in 3 rd grade consisted of 41 multiple-choice items that assess 
reading comprehension on six reading passages. Passages were either short fictional stories 
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Table 1. Performance and Gender Makeup of Each Group 





Group 




A a 

(General Ed. 
No Accomm.) 


B b 

(Low Perf. 
No Accomm.) 


C c 

(IEP 

No Accomm.) 


D d 

(IEP with 
Accomm.) 


Communication Arts 8 










Mean 


31.0 


23.0 


21.6 


23.7 


SD 


7.13 


8.05 


7.76 


7.50 


N 


1002 


995 


.600 


661 


% Female 


49 


45 


34 


33 


Mathematics' 










Mean 


24.9 


19.2 


18.9 


20.1 


SD 


5.13 


6.23 


6.34 


5.99 


N 


1139 


1006 


831 


1082 


% Female 


49 


50 


33 


32 



a general education students taking the test without accommodations 

b low performing non-disabled students taking the test without an accommodation 

c IEP students taking the test without an accommodation 

d IEP students taking the test with the read-aloud accommodation 

6 highest possible score = 41 

1 highest possible score = 32 



with fewer than 250 words or short poems. Students were required to answer both literal and 
inferential test items. 

The mathematics test that is administered in 4 th grade consisted of 32 multiple-choice items. 
Test items were chosen so that the following five areas were represented: number sense, 
geometric/spatial sense, patterns and relationships, mathematical systems and number theory, 
and discrete mathematics. 



Procedures* 



The primary objective of this study was to ascertain the effect that the read-aloud accommodation 
had on the validity of math test scores and reading comprehension test scores. It is usually 
assumed that achievement tests (e.g., a math test) measure a single ability, and the ability that is 
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measured is the same in each sub-population. When a test measures a different ability in two 
sub-populations, score interpretation becomes very difficult. It does not mean that the test is 
valid in one population but not in the other, only that the meaning of the scores is different 
across populations. This difference makes it difficult to discuss the effects of various conditions 
on test score validity. The resolution is to select one group as the standard against which the 
interpretation of the validity of scores for other groups is described. This standard group is 
called the reference group. 

Many methods have been developed to ascertain whether a test measures the same trait in two 
groups. The method chosen in this study uses item response theory. Item response theory 
constitutes a set of mathematical models and assumptions that define the probability of getting 
an item correct as a function of an examinee’s ability and item characteristics. When the 
assumptions hold and the model fits the data, the item characteristics are not dependent on the 
characteristics of a particular population. One such item characteristic is item difficulty. 

It has been argued that an item difficulty difference of sufficient magnitude indicates that the 
item is measuring a different ability for the two groups. These items are referred to as DIF 
(differential item functioning) items. The presence of many DIF items suggests that the test 
measures different abilities for different groups of examinees. This study compared the item 
difficulty estimates from each of the four groups defined above. The question to be answered 
was whether item difficulty estimates generated using a unidimensional three-parameter logistic 
model were substantially different for students receiving the read-aloud accommodation 
compared to students who took the test without an accommodation. 

Accommodations research using extant data is complicated by the fact that the use of test 
accommodations is confounded by the presence of a disability and low achievement. Students 
without a disability usually do not receive accommodations and score well-above students with 
disabilities. Thus, students receiving accommodations typically differ from non-accommodated 
students in three ways: (1) lower performance, (2) the presence of a disability, and (3) the 
presence of an accommodation. Identifying four groups of examinees allows us to begin to 
isolate the effect of the read-aloud accommodation. First, a random sample of students without 
disabilities who do not use accommodations was selected to serve as the reference group (Group 
A). The item structure for this group serves as the standard for what the test was designed to 
measure. A random sample of low ability students without disabilities was used to study the 
effect for the overall performance difference (Group B). A group of students with an IEP in 
reading who took the test under standard conditions was used to study the effect that the disability 
has on the construct being measured (Group C). Item difficulty differences between Group C 
and Group A represent the effect of the disability plus the effect for the overall performance 
difference; therefore, the effect found for the former group must be subtracted from the effect 
for this group in order to study the effect for the disability. Last, a group of students with an IEP 

er|c " CE0 7 



13 



in reading who received the read-aloud accommodation (Group D) was used to study the effect 
that the accommodation had on the construct being measured. Item difficulty differences between 
Group D and Group A include any effect of the accommodation plus the effect of the disability 
plus the effect of the overall performance difference; therefore, the effect attributed to Group C 
must be subtracted from the effect for Group D in order to study the effect for the accommodation. 

The program, BILOG-MG, was used to fit the three-parameter logistic model to the data 
(Zimkowskj, Muraki, Mislevy, & Bock, 1996). BILOG-MG makes it possible to produce item 
difficulty estimates on several groups simultaneously; all that is required is to define one group 
as the reference group and the other groups as focal groups. The reference group is used to set 
the location (mean) and the metric (standard deviation) of the item difficulty scale. Item difficulty 
estimates for each focal group are placed onto the scale of the reference group by applying two 
sets of constraints. First, BILOG-MG constrains the item discrimination and pseudo-guessing 
parameters to be equal across groups for each item. Second, it constrains the sum of the item 
difficulty estimates in the focal groups to equal the sum of the item difficulty estimates in the 
reference group. After applying these constraints, the resulting item difficulty estimates are on 
a common scale and mathematical operations can be applied to them. 

The effect that the overall performance difference, the presence of the disability, and the use of 
the read-aloud accommodation had on item difficulty can be ascertained by comparing the item 
difficulty difference between conditions (focal group item difficulty minus reference group 
item difficulty). Because the difference may be either negative or positive, it is necessary to 
square the difference so that the sum will produce a positive number that indicates the overall 
magnitude across all test items. The square root of the average squared discrepancy is referred 
to as the root-mean-squared discrepancy (RMSD). The formula for the RMSD is shown here: 



where j represents the j 1 * 4 item, k is the number of items on the test, b. p is the logit item difficulty 
estimate on the j* item for the focal group and b. R is the logit item difficulty estimate on the j* 
item for the reference group. The RMSD gives a measure of the overall item difficulty difference 
across all test items. A large RMSD suggests that the items operated differently in the focal 
group compared to the reference group. 

The magnitude of the RMSD is a function of the effect of interest as well as measurement and 
sampling error. One would expect the RMSD to be some value greater than zero even between 
two randomly equivalent groups. An estimate of the amount of the RMSD that could be attributed 
to estimation error was ascertained by computing the RMSD between two random samples of 
approximately 1000 general education students. This value is analogous to the error term in the 
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denominator of a t-test. A judgment of statistical significance could be made by computing the 
ratio of the RMSD from one of the focal groups to the RMSD that results from estimation error. 

Another way to ascertain the magnitude of each effect would be to compare the fit of the 3- 
parameter logistic (PL) model when the item difficulties are constrained to be equal across 
groups versus the model in which item difficulties are allowed to vary across groups. BELOG- 
MG calculates the -2 log likelihood statistic, which can be used to judge the goodness of fit of 
the model to the data. In order to ascertain the magnitude of each effect, one analysis was 
conducted in which the item difficulty estimates for the reference group (Group A) and each 
focal group (Group B, C, and D) were constrained to be equal across groups. A second analysis 
was conducted using the DIF option in BILOG-MG. The DIF option allows the item difficulties 
to vary across groups. The difference in the -2 log likelihood statistic between the two analyses 
provides an indication of the overall magnitude of the effect. Twelve analyses were conducted, 
two for each effect on both the reading and math test items. 

BILOG-MG computes the item difficulty difference across groups (b. F - b. R ) after the items 
have been rescaled to a common scale, and it generates the standard error of this difference. The 
standard error can be used to determine whether the difference statistically differs from zero 
(no difference). Items for which the ratio of the difficulty difference to the standard error exceeds 
2.0 indicate that the item is measuring something different in the two groups, which is known as 
differential item functioning, or DIF for short. The number of DIF items was also used to 
ascertain the magnitude of the effect of each condition; the presence of many DIF items is 
compelling evidence that a test functions differently for two groups. 



Average Squared Discrepancy Results « ■— ™ > « — - - ■ 

Reading Test 

The effect of the read-aloud accommodation on item difficulty was described in three ways. 
First, discrepancies between the item difficulty estimates for Group A and each of the focal 
groups (Groups B, C, and D) were computed. Table 2 summarizes the results from the reading 
test. The table reports the median squared discrepancy and not the mean squared discrepancy 
because the distribution of the item difficulty differences was positively skewed. The median 
squared discrepancy between the low performing general education students (Group B) and the 
random sample of general education students (Group A) was only .009, thus indicating that the 
rescaled item difficulty estimates were very similar for these groups. Because these values are 
squared, it is necessary to take the square root to obtain the magnitude of the discrepancy; 
therefore, the median squared discrepancy for Group B corresponds to an average item difficulty 
difference between Group B and Group A of about .10 (i.e., the square root of .009). The median 
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Table 2. The Average Squared Discrepancy Between the Item Difficulty Estimates Obtained on the 
Focal Groups (Groups B, C, and D) and the Reference Group (Group A) on the Reading Test 



Squared Discrepancy 


Effect 


Median 


Maximum 


75 ,h Percentile 


Performance alone 1 


.009 


.099 


.021 


Perf. + Disability 2 


.042 


1.724 


.091 


Perf + Dsbl + Accommodation 3 


.086 


4.792 


.310 



1 Perf = Group B compared to Group A 

2 Perf + Disability = Group C compared to Group A 

3 Perf + Dsbl + Accommodation = Group D compared to Group A. 



squared discrepancy for the group of students with an IEP in reading taking the reading test 
without an accommodation (Group C) was .042, or four times greater than the median squared 
discrepancy between Group B and Group A. The students with an IEP in reading who took the 
reading test with the read-aloud accommodation (Group D) had a median squared discrepancy 
of .086, which was 10 times greater than the median squared discrepancy between Group B and 
Group A. 

The distribution of squared discrepancies was skewed, as evidenced by the magnitude of the 
maximum value compared to the value at the 75 th percentile. For Group C the value of squared 
discrepancy corresponding to the 75 Ih percentile was .091 on the reading test, whereas the 
maximum value was 1.72. In other words, the largest squared discrepancy (1.72) was 19 times 
larger than the squared discrepancy corresponding to the 75 th percentile. In Group D, the 
maximum squared discrepancy was 4.79, which was more than 15 times larger than the squared 
discrepancy corresponding to the 75 th percentile. A cursory examination of Appendix A indicates 
that the last three questions on the reading test were much more difficult for the students with an 
IEP in reading (groups C and D) than for the general education students. It is important to note 
that these differences are not the result of overall performance because the rescaling controls 
for overall performance differences. It is not clear why the last three items were so much harder 
for the students with an IEP in reading. An examination of frequency of not attempting each 
item indicates that this alone cannot account for the large difficulty difference. About 1% of the 
students in Groups A, B, and C did not attempt these items, compared to over 2% of the students 
in Group D. Removing the last three items from the results would dramatically reduce the 
amount of skew in the distribution of squared discrepancies. 
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Math Test 



Table 3 contains the summary of the squared discrepancies between the item difficulty estimates 
for each of the focal groups (Groups B, C, and D) and the reference group (Group A). The 
median squared discrepancy for Group B was quite small (.008), which is about the same 
magnitude as the median squared discrepancy for Group B on the reading test. A value this 
small indicates that the rescaled item difficulty estimates were similar between Group B and 
Group A. Unlike the results on the reading test, the median squared discrepancies for Group C 
and Group D were relatively small, .023 and .022 respectively. A squared discrepancy of this 
magnitude indicates that, on average, the item difficulty difference between the reference group 
and each of the focal groups was in the order of .10 to .20. The distribution of squared 
discrepancies was not as skewed on the math test as it was on the reading test, thus indicating 
that no item or subset of items would dramatically influence the results (see Appendix B). 



Table 3. The Average Squared Discrepancy Between the Item Difficulty Estimates Obtained on the 
Focal Groups (Groups B, C, and D) and the Reference Group (Group A) on the Math Test 



Squared Discrepancy 


Effect 


Median 


Maximum 


75 th Percentile 


Performance alone 1 


.008 


.425 


.039 


Perf. + Disability 2 


.022 


.521 


.124 


Perf + Dsbl + Accommodation 3 


.023 


.286 


.071 



' Pert = Group B compared to Group A 

2 Pert + Disability = Group C compared to Group A 

3 Pert + Dsbl + Accommodation = Group D compared to Group A. 
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Two fit indices were used to estimate the magnitude of each effect — the effect due to the 
performance difference, the effect due to the disability, and the effect due to the read-aloud 
accommodation. One index, called the root mean squared discrepancy, gauges the overall 
discrepancy between item difficulty estimates between the reference group and each focal group. 
When the model fits the data and there is no estimation error, the RMSD should equal zero. 
However, a portion of the variance in these item difficulty differences can be attributed to the 
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estimation error. The magnitude of the estimation error is a function of both the number and 
quality of the items, and the number of examinees. In order to estimate how much of the RMSD 
can be attributed to estimation error, the RMSD between two random samples of general 
education students each with about 1000 students was computed. The RMSD between these 
two samples provides an estimate of the amount of estimation error present in the other 
comparisons. 

The -2 log likelihood difference between the unconstrained and the constrained model was also 
used to evaluate each effect. For the unconstrained model, item difficulty was allowed to vary 
between the reference group and the focal group. For the constrained model, the item difficulty 
estimates were constrained to be equal across groups. A large -2 log likelihood indicates that 
the item difficulties are not the same between the focal group and the reference group. 

Table 4 displays the RMSD and the -2 log likelihood fit index (G 2 c - G 2 u ) for each focal group/ 
reference group comparison. The effects present in each comparison are shown in the left column. 
Each effect was isolated by subtracting the RMSD of the comparison that contains all of the 
effects except for the effect to be isolated. For instance, the effect due to the learning disability 
was calculated by subtracting the RMSD due to performance from the RMSD due to performance 
and disability. The statistical significance of this effect was determined by calculating the ratio 
of that difference to the RMSD attributable to estimation error. The ratio follows a t-distribution; 
therefore, an absolute value greater than 2.0 indicates a statistically significant effect. For a 



Table 4. The RMSD Between the Reference Group and the Focal Group, and the -2 Log 
Likelihood Fit Between the Constrained G 2 c and the Unconstrained model (G 2 y ) 





Reading 




Math 




Effect 


RMSD 


G 2 c-G 2 u 


RMSD 


G 2 c -G 2 u 


Estimation error 


.11 




.19 




Performance alone 


.13’ 


71 2 2 


.20 


614 


Perf + Disability 


.36 1 


843 2 


.22 


605 


Perf + Dsbl + Accommodation 


.67’ 


1261 2 


.30 


612 



' if the last three items were removed, the RMSD would be .15, 21 , and .38 respectively. 
2 The difference in model fit would be 671 , 767, and 685 respectively. 
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one-tailed hypothesis test with 40 degrees of freedom on the reading test, the critical t is 1 .684, 
and for the 31 degrees of freedom on the math test the critical t is 1 .696. 



Reading Test 

For the reading test, the effect due to estimation error resulted in an RMSD of .1 1. The effect 
due to the overall performance difference was .13, which is about the same magnitude as that 
due to estimation error, indicating that the effect was not significant. The RMSD was roughly 
three times larger (RMSD = .36) when the presence of the disability was added, and it was 
roughly six times larger (RMSD = .67) when the effect for the disability plus the effect for the 
accommodation was included. The effect for reading disability was .23 (.36 - .13) and it was 
significant (t = 2.09). The effect due to the read-aloud accommodation was .3 1 (.67 - .36), 
which is also significant (t = 2.82). 

The -2 log likelihood difference between the constrained and unconstrained model for each 
effect displayed a similar pattern of increasing magnitude. When the two groups (reference and 
focal) differ only on ability, the difference in the -2 log likelihood was 712, it increased to 843 
with the addition of the effect for disability, then jumped to 1261 with the addition of the effect 
for the read-aloud accommodation. The difference between the -2 log likelihood is distributed 
as a chi-square. The degrees of freedom for the chi-square test is equal to the number of items 
on the test. Each of the effects resulted in significantly poorer model fit. In other words, the 
item difficulties are not the same across groups. 

The last three items on the reading test had much larger item difficulty differences (Reference 
Group - Focal Group) than the other items. The item difficulty difference was -.83, -1.02, and 
—1.31 in the comparison between Group C and Group A, and it was -2.04, -1.91, and -2.19 in 
the comparison between Group D and Group A. In other words, these items were much harder 
for the two reading disability groups. The RMSD was recalculated without the last three items, 
with a result of a RMSD of .22 for Group C and .40 for Group D. The effect due to the presence 
of the reading disability was recalculated: t = [(.22-. 1 3)/. 1 1] = .82. 

After removing the last three items, the effect for reading disability was no longer significant. 
The effect due to the read-aloud accommodation was also recalculated and was not significant 
(t = [(.40-.22)/.l 1] = 1.64), although it was close to the critical value of 1.68. 
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Math Test 



Each of the effects was negligible on the math test. The magnitude of estimation error was .19. 
The RMSD for the effect due to the overall performance difference was .20. As with the reading 
test, the effect for the overall performance difference was not statistically significant. The presence 
of the performance difference plus the reading disability resulted in a RMSD equal to .22, and 
a t equal to .10. The result indicates that the presence of a reading disability does not significantly 
change the construct being measured. In the absence of a significant effect for the reading 
disability, it makes little sense to determine whether the use of the read-aloud accommodation 
significantly reduces the item difficulty discrepancies. However, the effect was computed to 
determine whether the presence of the read-aloud accommodation made matters worse. The t 
associated with the combined effect of the reading disability and the read-aloud accommodation 
was .53, not statistically significant. The magnitude of the difference in model fit ranged from 
605 for Group C to 614 for Group B. The fit statistic suggests that neither disability status nor 
accommodation status, nor the combination of the two, had an overall DIF effect. 



The third way in which the effect of each condition (i.e., the effect for ability differences, the 
effect for disability status, and the effect for the accommodation) was examined was to count 
the number of DIF items for each group. An item was considered a DIF item if the ratio of the 
item difficulty difference ([b f -b r ]/SE (bfbr) ) exceeded 2.0. Because this ratio is distributed as a t- 
distribution, a value of 2.0 indicates statistical significance for a two-tailed alpha equal to .05. 
Table 5 shows the number of DIF items for each group on each test. Only one item was flagged 
as DIF for Group B on the reading test, and none were flagged on the math test. Group C had 10 
DIF items on the reading test, and only one DIF item on the math test. Finally, Group D had 19 
DIF items on the reading test and six DIF items on the math test. 



The goal of this study was to ascertain whether the read-aloud accommodation altered the 
construct being measured by a reading test and a math test. Specifically, it was argued that item 
characteristics for students with a reading disability taking a math test and a reading test with 
the read-aloud accommodation should be the same as those for students without disabilities 
taking each test without the accommodation, whereas the item characteristics for students with 
a reading disability taking each test without the read-aloud accommodation should differ from 
the reference group. 
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Table 5. The Number of DIF Items 



Number of DIF items 


Group 


Reading 


Math 


B 


1 


0 


C 


10 


1 


D 


19 


6 



Because the students receiving the accommodation differed from the reference group on three 
conditions — ability level, disability status, and use of an accommodation — a direct comparison 
between the reference group and the group receiving the read-aloud accommodation would be 
confounded. It was necessary to include two additional comparison groups — one that estimates 
the effect of the ability difference, and one that estimates the effect of the combination of the 
ability and disability status. Although we did not have the advantage of random assignment to 
all conditions, by using three focal groups, the effect of the read-aloud accommodation could 
be estimated. 

The root-mean-squared-discrepancy was used to index the overall item discrepancy. A large 
RMSD suggests that the test items measure different constructs. The extent to which the test 
items measured the same construct for the three conditions was also evaluated by a count of the 
number of items that differed significantly from the estimate in the non-disabled, non- 
accommodated sample. These were called DIF items. 



Reading Test 

The results from the reading test indicated that item difficulty was substantially different from 
the reference group for students with a reading disability taking the test without an 
accommodation. In other words, in the absence of an accommodation, reading comprehension 
test scores will not mean the same thing for students with a reading disability as they do for 
other examinees. This type of evidence suggests that some alteration to the regular test is necessary 
to obtain comparable scores for students with and without a reading disability. 

Providing certain accommodations to students with a reading disability is one way to obtain 
comparable scores. A common accommodation is to read the test passages and items to the 
students. The results of this study do not support this approach. When the read-aloud 
accommodation was provided to students with a reading disability, the item difficulty differences 
actually increased. The RMSD was two times greater for the students receiving the read-aloud 
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accommodation than for students taking the test without it. Furthermore, the number of DIF 
items nearly doubled from 10 items for the students taking the test without the accommodation 
to 19 items for students taking the test with the read-aloud accommodation. 

Reading the reading test made a bad situation worse. Because students were not randomly 
assigned to conditions, these results cannot be used to declare that reading the reading test 
represents an invalid accommodation. However, this demonstrates that the score for students 
receiving this accommodation in an actual testing situation results in scores that do not mean 
the same thing as the scores for the students without disabilities who do not receive 
accommodations. 

Item difficulty differences on the reading test were most pronounced on the last three items. 
The item difficulty difference exceeded one standard deviation unit on each of these three items. 
The direction of the difference indicated that these items were much more difficult for the 
students with a reading disability, whether they had the accommodation or not. When these 
items were removed, and the RMSD was recomputed, the overall difference between the item 
difficulty estimates for students with a reading disability taking the test without an accommodation 
and the non-disabled reference group were no longer statistically significant. Removing these 
items resulted in a non-statistically significant effect for the reading disability. This implies 
that, after removing the. last three items, the overall test scores for students with a reading 
disability taking the test without an accommodation measure the same latent trait as test scores 
for the non-disabled, non-accommodated group. This was not the result for the students receiving 
the read-aloud accommodation. Even with the last three items removed, the RMSD still indicated 
that test scores were not a measure of the same latent trait as they were for students in the 
reference group. The presence of the read-aloud accommodation made things worse. 



Math Test 

Item difficulty estimates for students with a reading disability taking the math test without an 
accommodation did not differ significantly from the item difficulty estimates for the reference 
group. In all, only one item was identified as a DIF item. Overall, the item difficulty estimates 
were not significantly different from the reference group when students with a reading disability 
took the math test using the read-aloud accommodation. These results suggest that test scores 
for the students with a reading disability measured the same trait as they did in the reference 
group, regardless of whether the student with a disability received the accommodation. However, 
6 of the 31 items were identified as DIF items. This finding indicates that the read-aloud 
accommodation altered the characteristics of some of the items. Four of the six items were 
statistically easier for the students receiving the accommodation than for the reference group 
after controlling for the overall performance difference. If these items were word problems, it 
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might suggest that the accommodation removed extraneous difficulty due to the reading load of 
the item. This study cannot verify this, but does point to the need to attach DIF results to item 
characteristics. 

The results from the math test provide a mixed picture. On the one hand, the findings indicate 
that the use of the read-aloud accommodation did not significantly alter item difficulty estimates. 
However, not reading the math test to students with a reading disability did not result in poor 
measurement either. Since the items seem to function the same for students with a reading 
disability without an accommodation as they do for general education students, one might wonder 
what the point would be in providing the read-aloud accommodation. Without the 
accommodation, only one item was flagged as DIF; with the accommodation, six items were 
flagged as DIF. It appears that providing the read-aloud accommodation to students with a 
reading disability makes a good situation bad. 

One possible explanation for why the read-aloud accommodation resulted in more DIF items is 
that administration of the read-aloud accommodation may have been flawed. Also, it may be 
that the read-aloud accommodation was not uniformly administered across schools. Non-uniform 
administration would introduce additional random error to the test score, which could result in 
less stable item difficulty estimates. One can easily imagine ways in which the read-aloud 
accommodation could differ from test proctor to test proctor. For instance, one proctor may 
read with more pronounced voice inflexions and examinees could construe them to represent a 
clue to the answer. This could also detract the examinee from solving the problem by herself or 
himself, and instead focus on unreliable cues. Other distractions may pose more of a problem 
when a proctor reads test material than when a student reads it. If one becomes distracted while 
reading, one could reread the material preceding the distraction. However, one does not have 
this luxury when a proctor reads the material. Presentation of the read-aloud via recording with 
a well-trained reader may mitigate these problems. 

Another potential problem can be attributed to group presentation of the read-aloud 
accommodation. In Missouri, the read-aloud is administered by having a proctor read the item 
and response choices to a small group of students or to individual students. A student may ask 
the proctor to re-read the problem. However, social pressures may limit these requests even if 
the students or group of students did not attend well to the initial reading. Because listening is 
a more passive activity than reading, it is easy to imagine that a student may not be attending 
well to the proctor, catching a student “off-guard” will likely result in that student missing the 
item. Furthermore, it is likely that students are aware of the behavior of other students in the 
room. When the correct answer is read, a student may watch how the other students react to 
each response option, and then make their choice when the other students seem to be marking 
their choice. 
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There also is the possibility that the read-aloud accommodation was not given to the students 
who actually needed it and was given to students who did not need it. This possibility seems 
unlikely given that the analysis only included students with a reading disability, however we 
have no basis for knowing whether some students with a reading disability might need an 
accommodation while others with the same disability might not. Presumably students with a 
reading disability should benefit from the accommodation, whereas students without a reading 
disability should not. It is possible that the identification of students with a reading disability is 
inadequate. 

There is also the possibility that the read-aloud accommodation is itself flawed, or that it is 
simply unnecessary for the vast majority of students with reading disabilities. Our findings 
indicate that the effect for the reading disability was not significant on the math test; additionally, 
only one DIF item was identified. These results indicate that the construct of the math test was 
not affected by the presence of the disability; thus, there was no reason to accommodate those 
students. Similarly, after removing the last three items on the reading test, the effect due to the 
reading disability was not significant. That is, reading comprehension test items tapped the 
same construct for students with a reading disability as they did for the students without the 
reading disability. 

Our findings indicate that either the read-aloud accommodation was unnecessary for students 
with reading disabilities taking a 3 rd grade reading comprehension test and a 4 th grade math test, 
or that only a subgroup of these students would benefit from the accommodation. This study is 
just a first step in establishing the effect of the read-aloud accommodation on the construct 
validity of a reading and math test. Strong conclusions about the validity of the read-aloud 
accommodation itself will require the accumulation of evidence from many studies, including 
more analysis of extant data. 
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Appendix A ■ 

Reading Test Adjusted Threshold Values 



ITEM GROUP 





A 


C 


D 


B 


R01 


| -3.574 | 


-3.971 | 


-4.155 | 


-3.649 




1 0.272* | 


0.228* | 


0.2 4 9* | 


0.207* 


R02 


| -3.220 | 


-3.429 | 


-3.082 | 


-3.312 




1 0.223* | 


0.166* | 


0.172* | 


0.149* 


R03 


| -0.614 | 


-0.715 | 


-0.646 | 


-0.608 




I 0.128* | 


0.140* | 


0.130* | 


0.120* 


R04 


| -1.843 | 


-2.132 | 


-2.427 | 


-1.818 




1 0.302* | 


0.288* | 


0.293* | 


0.268* 


R05 


| -1.511 | 


-1.421 | 


-1.085 | 


-1.666 




| 0.126* | 


0.110* | 


0.095* | 


0.098* 


R06 


| -1.606 | 


-1.913 | 


-1.991 | 


-1.647 




I 0.155* | 


0.155* | 


0.14 6* | 


0.129* 


R07 


| -1.378 | 


-1.552 | 


-1.504 | 


-1.569 




| 0.138* | 


0.130* I 


0.121* | 


0.116* 


R08 


| -1.667 | 


-1.636 | 


-1.881 | 


-1.850 




| 0.119* | 


0.111* | 


0.111* | 


0.102* 


R09 


| -0.582 | 


-0.476 | 


-0.646 | 


-0.654 




| 0.077* | 


0.089* | 


0.079* | 


0.069* 


RIO 


| -3.005 | 


-3.305 | 


-3.278 | 


-3.104 




j 0.220* | 


0.181* | 


0.192* t 


0.164* 


Rll 


| 0.194 | 


-0.158 | 


-0.811 | 


0.189 




I 0.092* | 


0.130* | 


0.111* | 


0.115* 


R12 


| 0.106 | 


-0.098 | 


-0.506 | 


-0.001 




I 0.046* | 


0.076* | 


0.057* | 


0.061* 


R13 


| -0.799 | 


-0.549 | 


-0.571 | 


-0.536 




I 0.130* | 


0.137* | 


0.124* | 


0.113* 


R14 


| 0.758 | 


0.591 | 


0.681 | 


0.652 




I 0.063* | 


0.154* | 


0.140* | 


0.111* 


R15 


| -0.990 | 


-1.017 | 


-1.267 | 


-0.994 




| 0.093* | 


0.091* | 


0.086* | 


0.077* 


R16 


| -1.244 | 


-1.203 | 


-1.261 | 


-1.153 




| 0.0 93* | 


0.086* | 


0.078* | 


0.072* 


R17 


| 0.020 | 


-0.072 | 


-0.212 | 


0.334 




| 0.121* | 


0.162* | 


0.140* | 


0.147* 


R18 


| -1.344 | 


-1.058 | 


-0.975 | 


-1.235 




| 0.127* | 


0.112* i 


0.104* | 


0.100* 


R19 


| 0.421 | 


0.441 | 


1.036 | 


0.564 




| 0.049* | 


0.141* | 


0.218* | 


0.105* 


R20 


| -1.729 | 


-1.994 | 


-1.785 | 


-1.843 




| 0.133* | 


0.124* | 


0.116* | 


0 . 107* 


R21 


| 0.278 | 


-0.116 | 


-0.538 | 


0.422 




| 0.055* | 


0.086* | 


0.069* | 


0.089* 



i r|( ceo 
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21 



ITEM 



A 



GROUP 

c 



D 



B 



R22 


| -0.052 | 


-0.378 | 


-0.270 | 


-0.119 




I 0.058* | 


0.080* | 


0.075* | 


0.073* 


R23 


| -0.111 | 


-0.172 | 


-0.413 | 


-0.397 




I 0.085* | 


0.119* | 


0.101* | 


0.090* 


R24 


| -0.727 | 


-1.062 | 


-1.465 | 


-0.816 




| 0.079* | 


0.084* | 


0.085* | 


0.069* 


R25 


| 0.073 | 


0.411 | 


0.366 | 


0.290 




I 0.088* | 


0.164* | 


0.143* | 


0.117* 


R26 


| -0.292 | 


-0.418 | 


-0.784 | 


-0.132 




I 0.089* | 


0.10 9* | 


0.096* | 


0.095* 


R27 


| -0.348 | 


-0.491 | 


-0.501 | 


-0.315 




I 0.067* | 


0.082* | 


0.077* | 


0.069* 


R28 


| -0.698 | 


-0.969 | 


-0.980 | 


-0.604 




I 0.100* | 


0.108* | 


0.102* | 


0.092* 


R29 


| -1.479 | 


-1.490 | 


-1.361 | 


-1.431 




1 0.125* | 


0.112* | 


0.103* | 


0.097* 


R30 


| -0.596 | 


-0.714 | 


-0.928 | 


-0.673 




I 0.112* | 


0.121* | 


0.111* | 


0.101* 


R31 


| -0.659 | 


-0.865 | 


-1.060 | 


-0.797 




I 0.068* | 


0.073* | 


0.070* | 


0.061* 


R32 


| -0.932 | 


-0.856 | 


-1.263 | 


-0.958 




1 0.083* | 


0.079* | 


0.077* | 


0.068* 


R33 


| -0.484 | 


-0.738 | 


-0.943 | 


-0.467 




I 0.086* | 


0.096* | 


0.091* | 


0.082* 


R34 


| 0.407 | 


0.531 | 


0.333 | 


0.700 




I 0.063* | 


0.151* | 


0.10 9* | 


0.120* 


R35 


| -0.848 | 


-0.565 | 


-0.832 | 


-0.703 




| 0.087* | 


0.090* | 


0.080* | 


0.073* 


R36 


| -2.020 | 


-1.833 | 


-1.950 | 


-2.115 




I 0.286* | 


0.258* I 


0.257* | 


0.245* 


R37 


| -0.575 | 


-0.578 | 


-0.027 | 


-0.492 




1 0.150* | 


0.158* | 


0.160* I 


0.134* 


R38 


| -0.129 | 


0.012 | 


0.044 | 


-0.270 




1 0.087* | 


0.135* | 


0.118* | 


0.094* 


R39 


| 0.160 | 


0.985 | 


2.199 | 


0.054 




I 0.096* | 


0.245* | 


0.510* | 


0.119* 


R40 


| -2.051 | 


-1.030 | 


-0.137 | 


-1.959 




| 0.187* | 


0.140* | 


0.144* | 


0 . 135* 


R41 


| -1.348 | 


-0.035 | 


0.841 | 


-1.355 




1 0.182* | 


0.187* | 


0.241* | 


0 . 146* 
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Math Test Adjusted Threshold Values 



ITEM GROUP 







A 




c 




D 




B 


M01 


1 -4 


.411 | 


-4 


.030 | 


-3 


.764 | 


-3 


.996 




1 o 


.504* | 


0 


.418* | 


0 


.413* | 


0 


.415* 


M02 


1 -2 


.524 | 


-2 


.378 | 


-2 


.006 | 


-2 


.533 




1 o 


.323* | 


0, 


.280* | 


0 


.267* | 


0 


.283* 


M03 


1 -0 


.166 | 


0 


.095 | 


0 


.553 | 


-0 


.167 




1 o 


.173* | 


0, 


.200* | 


0 


.201* | 


0 


.178* 


M04 


1 -3 


904 | 


-4 


.203 | 


-4 


.172 | 


-4 


.561 




1 o 


.377* | 


0, 


.32 8* | 


0 


.332* | 


0 


.339* 


M05 


1 -3 


.396 | 


-3 


. 998 | 


-3 


.898 | 


-3 


.631 




1 o 


.586* | 


0, 


.570* | 


0 


.566* | 


0 


.554* 


MO 6 


1 -1 


.179 | 


-1 


.105 | 


-1 


.151 | 


-1 


.113 




1 o 


.131* | 


0 , 


.124* | 


0, 


.111* | 


0 


.118* 


MO 7 


1 -0 


.852 | 


-0 


.762 | 


-0, 


. 653 | 


-0 


.866 




1 o 


.073* | 


0 , 


.073* | 


0, 


.063* I 


0 


.068* 


MO 8 


1 -1 


. 691 | 


-1. 


.572 | 


-1, 


.442 | 


-1 


.866 




1 0, 


.177* | 


0, 


.160* | 


0, 


.144* | 


0 


.162* 


MO 9 


1 -0 


.611 | 


-0 


.816 | 


-1 


.029 | 


-0 


.671 




1 o, 


.101* | 


0, 


.108* | 


0, 


.094* | 


0, 


.102* 


MIO 


1 o. 


. 021 | 


-0 


.053 | 


0 


.044 | 


0 


.058 




1 o . 


.0 93* | 


0. 


.118* | 


0, 


.105* | 


0, 


.109* 


Mil 


1 0. 


.383 | 


0. 


.313 | 


0. 


.502 | 


0 


.420 




1 o . 


.088* | 


0, 


.134* | 


0. 


.123* I 


0, 


.125* 


M12 


1 “1 


.431 | 


-0. 


.975 | 


-0. 


. 924 | 


-1, 


.156 




1 o . 


.122* | 


0. 


.105* | 


0. 


.094* | 


0. 


.100* 


M13 


1 -2. 


.401 | 


-2 


.718 | 


-2 


.550 | 


-2 


.771 




1 o. 


.228* | 


0. 


.211* | 


0. 


,205* | 


0, 


.211* 


Ml 4 


1 -2 , 


.710 | 


-2 


.336 | 


-2. 


, 628 | 


-2 


.660 




1 o. 


.201* | 


0. 


.160* I 


0. 


,165* | 


0. 


.163* 


Ml 5 


1 “1. 


.569 | 


-1. 


.732 | 


-1. 


,866 | 


-1. 


.864 




1 o . 


.143* | 


0. 


.134* | 


0. 


,128* | 


0. 


.129* 


Ml 6 


1 -2. 


. 615 | 


-2. 


,440 | 


-2. 


, 667 | 


-2 


.785 




i o . 


.180* | 


0. 


,158* | 


0. 


,162* I 


0. 


.163* 


Ml 7 


1 -1. 


. 911 | 


-1. 


858 | 


-1. 


,856 | 


-1. 


.962 




1 o. 


.166* | 


0. 


,150* | 


0. 


,141* | 


0. 


.146* 


Ml 8 


1 -1. 


.492 | 


-1. 


623 | 


-1. 


590 | 


-1. 


,401 




1 o . 


.218* | 


0. 


,205* I 


0. 


,194* | 


0. 


.196* 


Ml 9 


1 -1 ■ 


201 | 


-1. 


260 | 


-1. 


455 | 


-1. 


050 




1 o. 


,106* | 


0. 


,100* | 


0. 


094* | 


0. 


.094* 


M20 


1 -1 ■ 


527 | 


-1. 


506 | 


-1. 


471 | 


-1. 


532 




1 o. 


,120* | 


0. 


,108* | 


0. 


096* | 


0. 


,101* 


M21 


1 -0. 


052 | 


-0. 


101 | 


-0. 


237 | 


-0. 


072 




1 o. 


,077* | 


0. 


,100* | 


0. 


081* | 


0. 


091* 


M22 


1 -0. 


542 | 


-0. 


834 | 


-0. 


967 | 


-0. 


733 




1 o. 


,124* | 


0. 


,130* | 


0. 


118* | 


0. 


125* 



er|c CE0 



29 



23 



ITEM 


A 


C 


GROUP 

D 


B 


M2 3 


| -0.255 | 


-0.240 | 


-0.131 | 


-0.256 




| 0.088* | 


0.104* | 


0.092* | 


0.096* 


M24 


| 0.122 | 


-0.086 | 


-0.139 | 


0.211 




| 0.049* | 


0.068* | 


.0.056* | 


0.069* 


M25 


| 0.006 | 


-0.188 | 


-0.125 | 


0.199 




| 0.066* | 


0.084* | 


0.073* | 


0.087* 


M26 


| -1.235 | 


-1.087 | 


-1.163 | 


-1.218 




| 0.092* | 


0.084* | 


0.074* | 


0.079* 


M27 


| -0.470 | 


-0.566 | 


-0.604 | 


-0.364 




| 0.0 65* | 


0.073* | 


0.061* I 


0.072* 


M28 


| -1.287 | 


-1.123 | 


-1.407 | 


-1.243 




| 0.131* | 


0.117* | 


0.109* | 


0.112* 


M29 


| 0.466 | 


0.491 | 


0.597 | 


0.765 




| 0.057* | 


0.115* | 


0.105* | 


0.119* 


M30 


| -0.755 | 


-0.772 | 


-0.885 | 


-0.546 




| 0.075* | 


0.078* | 


0.0 67* | 


0.075* 


M31 


| -1.143 | 


-0.914 | 


-0.885 | 


-0.971 




| 0.163* | 


0.154* | 


0.139* | 


0.148* 


M32 


| 0.230 | 


0.276 | 


-0.133 | 


0.237 




| 0.106* | 


0.145* | 


0.112* | 


0.132* 
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